fix(providers/common-ai): LlamaIndexEmbeddingOperator always returns … by bramhanandlingala · Pull Request #68434 · apache/airflow

bramhanandlingala · 2026-06-12T07:29:03Z

Summary:
LlamaIndexEmbeddingOperator.execute() always returned "vector": None for every chunk in the output. The root cause is that VectorStoreIndex._get_node_with_embedding() calls node.model_copy() and assigns the embedding on the copy — the original node objects are never mutated. This has been the behavior across all tested llama-index-core versions (v0.10–v0.14).

Fix:
Pre-embed all nodes by calling embed_model.get_text_embedding_batch() and assigning the resulting vectors onto the original node objects before passing them to VectorStoreIndex. Since embed_nodes() inside VectorStoreIndex skips nodes whose .embedding is already populated, this produces no duplicate embedding API calls.

Testing:
Manually reproduced the bug with a mock stub mirroring llama-index's model_copy() behavior
Verified the fix produces correct non-None vectors for all chunks
No changes to public API or operator interface

Related: Fixes #68416

…vector=None

ashb

This needs unit tests please

kaxil · 2026-06-12T14:38:55Z

+        # ``embed_nodes()`` inside VectorStoreIndex skips nodes whose
+        # ``.embedding`` is already set, so pre-embedding causes no duplicate
+        # API calls.
+        texts = [node.get_content() for node in nodes]


get_content() with no argument is MetadataMode.NONE, but llama-index's own embed_nodes() embeds node.get_content(metadata_mode=MetadataMode.EMBED), which includes node metadata and respects excluded_embed_metadata_keys. Checked on llama-index-core 0.14.22: for a node with metadata={"src": "a"} the library embeds 'src: a\n\nhello world' while this embeds just 'hello world'. Since the operator attaches user metadata to every Document and the pre-set embeddings are reused for the persisted index, metadata stops contributing to the vectors entirely. Suggest metadata_mode=MetadataMode.EMBED here so the pre-embed step keeps the same embedding semantics llama-index applies itself.

kaxil · 2026-06-12T14:38:55Z

+        # ``.embedding`` is already set, so pre-embedding causes no duplicate
+        # API calls.
+        texts = [node.get_content() for node in nodes]
+        vectors = embed_model.get_text_embedding_batch(texts, show_progress=False)


This breaks test_byo_embed_model_bypasses_hook: _byo_embedding() builds its mock with spec=["get_text_embedding", "_get_query_embedding"], so calling .get_text_embedding_batch on it raises AttributeError. The duck-type check in _resolve_embed_model has the same gap, it accepts anything with those two methods but execute now needs a third. Adding get_text_embedding_batch to both the spec and the check would cover it, along with the tests ashb asked for on the new pre-embed path.

fix(providers/common-ai): LlamaIndexEmbeddingOperator always returns …

291ca59

…vector=None

bramhanandlingala requested review from gopidesupavan and kaxil as code owners June 12, 2026 07:29

boring-cyborg Bot added area:providers provider:common-ai labels Jun 12, 2026

ashb reviewed Jun 12, 2026

View reviewed changes

kaxil reviewed Jun 12, 2026

View reviewed changes

kaxil mentioned this pull request Jun 13, 2026

Fix LlamaIndexEmbeddingOperator returning vector=None for every chunk #68491

Merged

kaxil closed this in #68491 Jun 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(providers/common-ai): LlamaIndexEmbeddingOperator always returns …#68434

fix(providers/common-ai): LlamaIndexEmbeddingOperator always returns …#68434
bramhanandlingala wants to merge 1 commit into
apache:mainfrom
bramhanandlingala:fix/#68416

bramhanandlingala commented Jun 12, 2026

Uh oh!

ashb left a comment

Uh oh!

kaxil Jun 12, 2026

Uh oh!

kaxil Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

bramhanandlingala commented Jun 12, 2026

Uh oh!

ashb left a comment

Choose a reason for hiding this comment

Uh oh!

kaxil Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

kaxil Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants